add kitsune.l10n app for handling content localization #6330
mozilla/sumo#2053
Notes
This PR introduces a new `kitsune.l10n` application into SUMO that, for now, automatically creates and manages machine translations of KB articles. It uses an LLM to create each machine translation, and if a prior approved translation exists, the machine translation is heavily influenced by that prior translation. This respect for prior contributor translations was the requirement that drove this approach.

The new `kitsune.l10n` app is designed to be independent of the apps containing the content that it localizes -- so far, just the `kitsune.wiki` app. In other words, the `kitsune.wiki` app doesn't know anything about -- doesn't import anything from -- the `kitsune.l10n` app. If the `kitsune.l10n` app were removed, the `kitsune.wiki` app would continue functioning as usual, just without automatically-generated machine translations.

In general, the system is relatively simple and consists of two main components:
Both of the main components above call the same Celery task, `handle_wiki_localization()`, which in turn uses the following two core functions:

- `create_machine_translations()`
- `manage_existing_machine_translations()`
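As a rough sketch of how these pieces fit together (the function bodies below are placeholders, not the PR's actual implementation, and the Celery decorator is omitted):

```python
# Hypothetical sketch of the task's overall shape; the real code lives in
# kitsune.l10n and its internals are assumptions here.

def create_machine_translations():
    """Create machine translations for approved content that lacks them."""
    return 0  # placeholder: number of translations created


def manage_existing_machine_translations():
    """Review and update machine translations that already exist."""
    return 0  # placeholder: number of translations updated


def handle_wiki_localization():
    # The Celery task simply runs the two core functions in sequence.
    created = create_machine_translations()
    managed = manage_existing_machine_translations()
    return created, managed
```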
Once the `handle_wiki_localization()` Celery task has started (in any Celery worker), it cannot be run again (in any Celery worker) until it has finished. This is managed via a Postgres advisory lock, which must be acquired in order to start, and which is released only upon normal completion or an exception. This prevents the (admittedly small) possibility of creating duplicate machine translations, which could occur if two instances of the task ran simultaneously.

All of the settings for machine translations can be managed via the Django admin, and any changes take immediate effect. Currently, machine translations can be restricted by locale, by KB article slug, by the group of the KB article approver, and/or by the approval date/time.
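A minimal sketch of the advisory-lock pattern described above, assuming raw cursor access to Postgres (the helper names and key derivation are illustrative, not the PR's actual code):

```python
# Illustrative only: try to acquire a Postgres advisory lock before running
# the task, and skip the run entirely if another worker already holds it.
import hashlib
from contextlib import contextmanager


def lock_key(name: str) -> int:
    # Postgres advisory locks are keyed by a signed 64-bit integer, so
    # derive one deterministically from the lock's name.
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


@contextmanager
def advisory_lock(cursor, name: str):
    key = lock_key(name)
    # pg_try_advisory_lock returns immediately with true/false instead of
    # blocking, so a second worker can simply skip the run.
    cursor.execute("SELECT pg_try_advisory_lock(%s)", [key])
    acquired = cursor.fetchone()[0]
    try:
        yield acquired
    finally:
        if acquired:
            cursor.execute("SELECT pg_advisory_unlock(%s)", [key])
```

A task body would then do `with advisory_lock(cursor, "handle_wiki_localization") as acquired:` and return early when `acquired` is false; the `try/finally` mirrors the PR's behavior of releasing the lock on normal completion or an exception.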
By default, this `l10n` application is disabled.

Local Testing
@akatsoulas @smithellis @emilghittasv -- All of you are already configured to impersonate the GKE dev service account, which provides access to the Vertex AI API.

1. Run `gcloud auth application-default login --impersonate-service-account <gke-dev-sa-email>` -- I can send you the email to use via Slack
2. Copy your gcloud config into the `kitsune` repo:
   `cd ~/repos/kitsune`
   `cp -pr ~/.config/gcloud ./gcloud`
3. Add the following to your `.env` file:
   `GOOGLE_APPLICATION_CREDENTIALS=./gcloud/application_default_credentials.json`
   `GOOGLE_CLOUD_PROJECT=moz-fx-sumo-nonprod`
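To catch a missing local setup early, a tiny helper like the following could verify those two variables before any Vertex AI call is attempted (this helper is a hypothetical convenience, not part of the PR):

```python
import os

# The two settings added to the .env file above.
REQUIRED_GOOGLE_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_CLOUD_PROJECT")


def missing_google_settings(env=None):
    """Return the names of required Google Cloud settings that are unset."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_GOOGLE_VARS if not env.get(name)]
```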
`gemini-1.5-pro-002`
Future Adjustments

If the LLM call (within `create_machine_translation()`) raises an exception, any pending machine translations are abandoned for the current run of the `handle_wiki_localization()` Celery task. Of course, that Celery task will run again at the next heartbeat, so we'll try again later, but it's possible we may want to handle some exceptions in the future. The challenge is that it's difficult to tell from the source code which exceptions might be raised, so I think we can wait to see what Sentry events we get, if any, and decide then whether it makes sense to handle them. I should add that the `invoke()` method of LangChain's chat model already retries on common, recoverable API exceptions (it's currently configured to retry twice before giving up), so we may not need to add any LLM API exception handling at all, and can simply allow exceptions to be raised and reported by Sentry (the current approach).

Infrastructure Configuration Needed
- Grant the `aiplatform.user` role to the stage and prod GKE service accounts via Terraform

TODO
- Add tests in `test_wiki.py` that cover the slug, date, and group filtering
- `SUMO L10n Bot` on the recent revisions page